Abstract 

Urban systems present hierarchical structures at many different scales. These are observed as administrative 
regional delimitations, which are the outcome of complex geographical, political and historical 
processes which leave almost indelible footprints on infrastructure such as the street network. In this 
work we uncover a set of hierarchies in Britain at different scales using percolation theory on the street 
network and on its intersections which are the primary points of interaction and urban agglomeration. At 
the larger scales, the observed hierarchical structures can be interpreted as regional fractures of Britain, 
observed in various forms, from natural boundaries, such as National Parks, to regional divisions based on 
social class and wealth such as the well-known North-South divide. At smaller scales, cities are generated 
through recursive percolations on each of the emerging regional clusters. We examine the evolution of the 
morphology of the system as a whole, by measuring the fractal dimension of the clusters at each distance 
threshold in the percolation. We observe that this reaches a maximum plateau at a specific distance. 
The clusters defined at this distance threshold are in excellent correspondence with the boundaries of 
cities recovered from satellite images, and from previous methods using population density. 


1 Introduction 

Countries are the historical outcome of the unification and the fragmentation of regions and
communities. Although many of these processes result from an organisation imposed through
administrative boundaries, others hold communities together through strong ideological ties at the
regional level, creating a strong sense of belonging. These processes are intricate cultural,
political and socio-historical pathways that have left footprints in the way communities emerge,
organise, trade and change spatially. These footprints are contained in the street patterns and
network, which are the main proxies for communication and exchange between settlements. It is hence
not surprising that these also encompass the socio-economic history of the region. Despite the
constant change and renewal of streets, we show in this work that these footprints can still be
identified and recovered.

We focus on Britain, whose road network has been evolving for over 2000 years. Its origins can be
traced back to the Iron Age with the Celts, but it was during the Roman occupation that a rapid
expansion of the roads took place, and a network was established. In the last 200 years, this has
been subjected to ongoing urban growth, and to adaptations for new extensions and modes of
transport. Britain presents a rich regional structure, whose tensions lead to a fractured landscape,
driven by ideologies and socio-historical trajectories. One example is the pressure to call a
referendum on Scottish independence (2014), and the rise of the Scottish Nationalists in the
elections of 2014-2015. Discerning the emergent fractions that are independent of administrative
provisions, but highly tied to their trajectories and ideologies, is of considerable relevance for
the understanding of the dynamical regional functioning of a country. In a different paper [1], we
corroborate the high correlation between regional divisions and polarisation represented through
voting patterns.

In this work we investigate whether the regional fractures of Britain can be observed through an
underlying hierarchical structure which can be recovered from the road network. From the beginnings
of locational analysis [2], there has been an understanding that regional organisation can be
perceived through movement, and that the expansion of infrastructural networks is tightly linked to
regional development. In this sense, the hierarchical perspective can be understood from the
economic performance of regions, leaving traces in the road network’s evolution. The quest to
characterise and quantify the regionalisation of the urban space in a hierarchical manner dates back
to the 1930s. Initial ideas focused on subdividing the space, considering on the one hand
morphological characteristics, and on the other population distribution. These were proposed in
various contexts, from regular lattices [3], to elaborate structures, such as the ones proposed by
Christaller [3] in his Central-Place hierarchies. Later on, efforts were directed into the
hierarchical aspect of distance between settlements 1316], where a direct correlation between the
size of a cluster and the distance to other clusters was identified [7]. This was extended in the
1960s to a perspective on the functional characterisation of clusters according to size. A hierarchy
emerges with respect to the types of relationships that exist given the cluster size (whether the
cluster is a village, a town or a city) [8, 9, 10]. These translate in contemporary urban studies to
the study of scaling laws mm, where nevertheless it is assumed that all types emerge, and the
hierarchy is indirectly linked with the aggregated value of the urban measurement, determined by the
size of the cluster. It seems that there is a need for a careful re-evaluation of the
characterisation of urban indicators at various regimes of size. Coming back to the hierarchical
approach to urban functionalities, this has paved the way for different perspectives on urban
modelling [13 El [13 [13 [HI-. Currently, network theory has been extremely successful and permeates
the methodologies employed for many different urban models. This approach nevertheless dates back to
the 1960s, when graph theory was used to establish a hierarchical structure between regions, through
the introduction of a measure of flow between places, from telephone calls to trade HH).

A recurrent aspect in the above-mentioned approaches which decode urban hierarchies is the
connectivity of the system. This can be explored through percolation theory lElEDI, which studies
how a piece of information (or a disease, or a fire, etc.) spreads in space, reaching a critical
point at which a giant cluster appears. In its most general form, the process is defined in an
infinite lattice and for a random occupation probability. Relaxing these constraints, the analysis
can be extended to finite systems, where the clusters are the outcome of some thresholding process.
Some of these systems present a multiplicity of percolation transitions, revealing a hierarchical
organisation. This was observed for the brain [21], where the percolation process is considered in
terms of the connectivity between voxels given by the different stimuli thresholds.

A crude analogy can be drawn between the structure of the brain and that of an urban system. Both
consist of highly integrated modules which connect to each other at different scales, giving rise to
a functional system. For the urban system, the modules correspond to its cities, and its different
regional divisions are a manifestation of its inherent hierarchical structure 13122]. We hence
implement a similar methodology to [21] on the street intersections of Britain, in order to unveil
its hierarchical organisation. Note that the network has been stripped away, and the percolation
process is hence applied to the street intersections, which correspond to the occupied sites in
space, connected to each other through proximity only. Using the intersection points as a proxy for
urbanisation can be justified from archaeological times. In Anglo-Saxon Britain [23], the assembly
places were defined at the main points of convergence, where the relevant interactions took place.
In contemporary times, it has been argued that the road intersections are the essential facilitators
for the necessary human dynamics that lead to a productive urban system [23]. In addition, there is
also a technical bias that we avoid by removing the links of the network. This relates to the
long-standing problems of digitisation of the dataset: faulty topology of the network, missing
streets, disconnected networks given by the inaccuracy of streets almost meeting, etc. In any case,
for the skeptical reader, we also provide a similar methodology developed directly on the entire
street network, which has been carefully prepared and checked in order to avoid many of the
above-mentioned problems, and we show that the results are equally recovered.

In the following sections we show that through a multiplicity of percolation transitions, the
hierarchical structure of Britain emerges. These transitions indicate fractures of various sorts,
from natural barriers, such as National Parks or lakes, to socio-economic polarisation such as the
North-South divide. The transition observed at the smallest scale defines the cores of the cities.
It is well-known that the morphological properties of cities and regions are notably different.
These have been extensively analysed for street networks [23 E3 [23 [23 EH]); nevertheless, the
statistical properties previously found cannot be used to define the boundaries of a city, since
there is no clear transition between urban and rural networks. Here we show that through the
analysis of the fractal dimension of the emergent clusters, a threshold can be identified at which
cities are well-defined. This specific morphological property, observed over the whole system, gives
a maximum over all the thresholds. At this maximum, the obtained clusters are in very good
correspondence with other proxies for cities, such as satellite images of the urbanised space, and
previous definitions of cities proposed by the authors [12].

2 Methodology and dataset 

Percolation theory is classically approached in terms of the probability of a site being occupied in
a lattice. It can also be thought of in terms of bond percolation, in which the sites are all
occupied, and the probability is that of a bond being open and connecting sites. In our analysis,
the sites will correspond to the intersection points.

In the following sections we present two methodologies: 1) the percolation on the intersection
points, and 2) the percolation on the street network. For both methods we use the most complete
database for street networks in Britain: the Ordnance Survey (OS) MasterMap [30]. For computational
purposes, we reduce the size of the dataset by introducing the following simplifications: 1) we
remove the points that do not convey any morphological information, such as nodes of degree two,
which for example correspond to streets changing name; 2) we replace roundabouts by a single
intersection point, which is primarily relevant for the methodology on networks. The original
dataset contains more than 23 million points and the final processed dataset contains only around 3
million intersection points. In detail, the network covering the whole of the UK has n = 3,390,758
nodes; m = 3,973,186 links; and the average node degree is ⟨k⟩ = 2.34.
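
As a consistency check on the figures quoted above (this check is not part of the original text), the average degree of an undirected network follows from the number of nodes and links, since every link contributes to the degree of two nodes:

```latex
\langle k \rangle = \frac{2m}{n}
                  = \frac{2 \times 3\,973\,186}{3\,390\,758}
                  \approx 2.34
```

The value agrees with the reported average node degree.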


2.1 Percolation on the street intersections 

For this method we take the dataset described above, and we remove all the street segments, leaving
only the intersection points. We then apply a clustering algorithm that corresponds to a
thresholding procedure parameterised by distance. This is simply defined as the Euclidean distance
between points, whether they are connected or not. We observe different configurations of clusters
appearing at different distances. This procedure can be interpreted in terms of bond percolation as
follows: the probability of a bond to be open between sites corresponds to the distance between the
intersection points. In this sense, one can think of a fully connected network in which the distance
between nodes gives the probability for the link to exist after a normalisation procedure.

In practical terms, the algorithm is similar to the CCA (City Clustering Algorithm) [31, 32] based
on population distribution in space, and the natural cities definition given also in terms of road
intersections [33]. In [34] this algorithm is also employed to understand the emergence of regions
through percolation theory, and in [35], in order to understand the spread of obesity in the United
States. It is important to note that most of these algorithms have been constructed in an effort to
define cities in a consistent way, and considerable research is still ongoing in this direction
[36, 37]. These algorithms differ from models of urban growth based on correlated percolation
[38, 39, 40], and on correlations with urban sprawl [41].

In detail, our algorithm is defined in terms of a distance parameter that determines clusters of
intersection points in which every point has a neighbour at a distance equal to or smaller than the
given threshold. The algorithm can be implemented on the continuous space or, for large datasets
requiring computationally demanding calculations, on a grid covering the space of points. Please
refer to Appendix A for more detail on the implementation of the algorithm.

2.2 Percolation on the network 

In this case, we are considering the ‘real’ network, where intersection points are connected if and
only if there is a street connecting them. The clustering procedure is very similar to the procedure
described above, but in this case the distance is given by the actual extent of the street. An open
bond hence corresponds in this case to an existing street according to the different distance
thresholds. And once again, the links can be reinterpreted in terms of probabilities if the
distances are normalised. Details on how to implement this can be found in Appendix B.


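The distance-threshold clustering of Section 2.1 can be illustrated in continuous space with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation (which, according to the appendix, runs on a raster grid): the function name `cluster_points` is hypothetical, coordinates are assumed to be planar and in metres, and a union-find structure with a spatial hash is our own choice for finding neighbours within the threshold.

```python
from collections import defaultdict
from math import floor


def cluster_points(points, d):
    """Return one cluster label per point; points within d of each other share a label."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Bucket points into square cells of side d, so that any neighbour
    # within distance d lies in the same cell or one of the 8 around it.
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(floor(x / d), floor(y / d))].append(idx)

    for (cx, cy), members in cells.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in cells.get((cx + dx, cy + dy), ()):
                    for i in members:
                        if i < j:
                            xi, yi = points[i]
                            xj, yj = points[j]
                            if (xi - xj) ** 2 + (yi - yj) ** 2 <= d * d:
                                # Merge the two clusters (no-op if already merged).
                                parent[find(j)] = find(i)

    return [find(i) for i in range(len(points))]
```

Calling `cluster_points(points, 180)` on a list of `(x, y)` tuples would return the cluster membership at a 180m threshold; repeating the call over a range of thresholds gives the sequence of cluster configurations discussed in the text.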
3 Results 


3.1 Urban hierarchies 

We analyse the process in the traditional way, by looking at the size evolution of the largest
cluster at the different distance thresholds [T^. A multiplicity of percolation transitions defining
the fractures at different scales is observed. The same divisions can be found in both systems, see
Fig. 1, although the critical distance threshold varies. As with any percolation process, the
critical threshold that determines the transition pertains to the geometrical properties of the
space. In the first system the percolation occurs on a regular grid, while in the second system, the
process takes place on the network itself. We will present the rest of the results for the first
scenario, hence all the critical distances correspond to those observed for the points of
intersection, and are illustrated in Fig. 2. Details of the maps can be found in the appendix in
Fig. S1.
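
The largest-cluster curve used to detect the transitions can be sketched for a small point set as follows. This is an illustrative Python sketch under our own assumptions, not the authors' code: the name `largest_cluster_curve` is hypothetical, and the all-pairs distance sort is quadratic in the number of points, so it is only practical for toy inputs rather than for millions of intersections.

```python
from itertools import combinations
from math import dist


def largest_cluster_curve(points, thresholds):
    """Fraction of points in the largest cluster at each distance threshold."""
    n = len(points)
    parent = list(range(n))
    size = [1] * n

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise distances, shortest first (O(n^2), illustration only).
    pairs = sorted((dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(n), 2))

    curve, k, largest = [], 0, 1
    for d in sorted(thresholds):
        # Open every bond whose length does not exceed the current threshold.
        while k < len(pairs) and pairs[k][0] <= d:
            _, i, j = pairs[k]
            ri, rj = find(i), find(j)
            if ri != rj:
                if size[ri] < size[rj]:
                    ri, rj = rj, ri
                parent[rj] = ri
                size[ri] += size[rj]
                largest = max(largest, size[ri])
            k += 1
        curve.append((d, largest / n))
    return curve
```

Sharp jumps between consecutive entries of the returned curve mark candidate percolation transitions, in the sense of the critical distances reported in this section.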
The first transition detected at dc = 120m denotes the merging of the north and south parts of
London separated by the river Thames. The giant cluster can be identified with the core of London.
Looking across the system, other large clusters correspond to the cores of important cities, such as
Birmingham, Manchester, Liverpool, Bristol, etc. This first transition is therefore representative
of a system of cities. Nevertheless, although the cores are recovered, the extent of the cities is
much smaller than expected. In the next section we will introduce a fractal analysis of the clusters
at each distance threshold, and we will show that the system maximises the value of the fractal
dimension at the highest point of correlation with the urbanised space, defined through a
classification of satellite images.

Figure 1: Evolution of the largest cluster size for the percolation on both systems (panels:
percolation on the intersection points; percolation on the network). The size has been normalised by
the total number of intersection points present in the dataset.

The giant cluster at the next transition, dc = 540m, encompasses the main post-industrial cities in
the North: Liverpool, Manchester, Leeds and Sheffield. These are the main core cities forming the
region known as the “Northern Powerhouse” (devolution proposal)^1, with some others, such as
Newcastle, missing from the cluster. The Northern Powerhouse proposal aims at devolving power and at
boosting economic growth in cities in the North, reducing the gap of wealth between these and the
cities in the South-East, mainly driven by London [45]. Such a disparity in wealth distribution
dates back to Roman times, and saw an intensification after de-industrialisation. Many claim that
the reforms introduced by Margaret Thatcher’s Conservative Governments during the 1980s made the
North-South divide even more significant.

^1 This terminology, adopted by the Chancellor of the Exchequer since June 2014 [42], has been
incorporated into the vocabulary of politics, featuring in the main news agencies in the UK
[43, 44].

The next transition at dc = 580m sees an expansion of that region, introducing Birmingham into the
largest cluster. The following smaller transitions see the annexing of Wales into the cluster at
dc = 660m, and at dc = 680m, that of Cornwall. At d = 740m, before the next transition takes place,
a very clear division between the North-West and the South-East can be observed. It is important to
note at this point that such a division is not the outcome of a geographical accident of sorts, say
the presence of natural barriers such as parks, mountains or rivers. In addition, it is not an
artefact of taking only the intersection points, since the split is also present when the
percolation is performed directly on the network. This split does not seem to be linked to the
topography of the country, but to the division of wealth that we have been discussing throughout. We
illustrate this in Fig. 3. The map on the right shows the 10 largest clusters in colour at
d = 740m. The map on the left shows model-based income estimates for England and Wales at the level
of Middle Layer Super Output Areas for 2007/08. These are based on census data, and hence Scotland
is excluded from this map, since it has a different census to England and Wales. The black
boundaries correspond to European administrative regional divisions called NUTS2 [46]. The dotted
lines indicate a clear agreement between the clusters obtained from the percolation process, and the
division of wealth in the country.

At dc = 760m, England and Wales, with the exception of Snowdonia (region of mountains and a National
Park in Wales), merge into one region, marking the most striking transition. This is followed by
smaller transitions resulting from the merging of areas with natural barriers. At d = 1120m,
Scotland can clearly be distinguished as a separate region from the rest of England and Wales. Note
that the split of Scotland from the rest of the country is not caused by barriers of topographic
nature either. This division is of a similar type to the one encountered earlier: it is the product
of a historical cultural differentiation that has been imprinted in the evolution of the
infrastructure of the country. The last important transition in the system is observed at
dc = 1140m, and it corresponds to the merging of Scotland with England and Wales.

Figure 2: Evolution of the largest cluster size for the percolation on the intersection points.


It can be argued that these results are biased, given that the connectivity between points does not
take into account whether these points can or cannot be connected through roads, since these were
removed. To reassure the reader, we perform the percolation on the road network and we present the
results in the appendix. The plots and maps (see Fig.) confirm the previous analysis, although at
different critical distances. These are larger, since they correspond to the length of the roads
individuals need to take to travel from one intersection point to another in the urban system. An
important result of percolation theory is that the value of the critical distances will vary from
dataset to dataset, and from system to system. We hence do not expect to recover the same distances
for the UK if a different dataset is used, nor the same distances for different countries.
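
For the network-based variant discussed above, a minimal Python sketch is to keep only the street segments whose length does not exceed the threshold and to read the clusters off as connected components. This is an illustration under our own assumptions rather than the authors' code: the function name `network_clusters` and the `(node_a, node_b, length)` edge format are hypothetical, segments exactly at the threshold are kept, and isolated intersections are returned as singleton clusters.

```python
from collections import defaultdict, deque


def network_clusters(edges, threshold):
    """Connected components of the street graph restricted to short links.

    edges: iterable of (node_a, node_b, length_in_metres) tuples.
    Returns a list of sets of nodes; isolated nodes form singleton clusters.
    """
    adjacency = defaultdict(set)
    nodes = set()
    for a, b, length in edges:
        nodes.update((a, b))
        if length <= threshold:  # assumption: links exactly at the threshold are kept
            adjacency[a].add(b)
            adjacency[b].add(a)

    clusters, seen = [], set()
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        cluster, queue = {start}, deque([start])
        while queue:  # breadth-first expansion using a first-in first-out queue
            node = queue.popleft()
            for neighbour in adjacency.get(node, ()):
                if neighbour not in seen:
                    seen.add(neighbour)
                    cluster.add(neighbour)
                    queue.append(neighbour)
        clusters.append(cluster)
    return clusters
```

Because the distance here is the length of a street segment rather than the straight-line separation of its end points, the thresholds at which clusters merge are expected to differ from those of the point-based procedure.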

The results can be summarised in a crude way, by looking at the evolution of the relevant largest
clusters through a dendrogram, see Fig. 4. The hierarchical structure becomes very tangible, and the
sizes of the nodes are scaled to represent the sizes of the real clusters.

3.2 Fractal properties 

Urbanised spaces have specific morphological properties that cannot be found in non-urbanised areas,
and many of these are manifested in the road network. We investigate in this section whether the
clusters defined at each distance threshold can characterise the urban system in a particular
morphological manner. We choose the fractal dimension as the property to analyse, given that this
has been extensively researched for the morphological description of cities [47, 48, 49]. In [50]
for example, a fractal analysis was undertaken to compare built-up areas from the Corine dataset
[51] at different built-up densities for 20 European cities.

Figure 3: Maps of England and Wales: right, at percolation distance threshold d = 740m; left,
thematic map of income with regional divisions given by NUTS2.

This section contains two subsections for the 
analysis of each of the systems, since the fractal 
dimension cannot be computed in the same way 
for point patterns and for a network. 

3.2.1 Clusters of intersection points 

Until recently, the characterisation of the fractal structure of a system consisted of a single
fractal dimension. For cities, this was traditionally obtained through a box-counting algorithm.
Nevertheless, this measure is extremely sensitive to the dataset and the implementation. In
addition, it is now well recognised that cities are actually multifractals [52, 53]. These are
objects that present different fractal properties at different scales and regions [54], and hence
cannot be fully characterised by a single fractal dimension, but need a spectrum of fractal
dimensions [55, 56, 57].

From the total spectrum, we select the three well-known fractal dimensions denoted by D0, D1 and D2,
where D0 is the capacity dimension, and in practical terms it corresponds to the box-counting
measure; D1 is the information dimension and it can be interpreted in terms of Shannon’s entropy;
and D2 is the correlation dimension, which is considered to be the most accurate one. A quick review
of these measures can be found in the appendix.

For the specific system at hand, we need to extract the characteristic fractal dimensions at each of
the distance thresholds. Nevertheless, the clusters that emerge have all sorts of sizes, from
individual points to very large clusters. Since we are interested in characterising the urban space,
and ultimately in recovering the cities of the system, which are the landmark of urbanisation, we
impose a minimum cluster size of 600 intersections for the computation of the fractal dimension. In
addition, we do not compute the fractal properties beyond a maximum distance threshold of d = 540m,
since the percolation method clearly returns regions beyond this transition, moving further and
further away from a configuration of cities. The methodology to compute these three dimensions
follows the same algorithms described in [53], where further details can be found. The results can
be seen in Fig. 5. The evolution of all three fractal properties of the system indicates a maximum
at d = 180m.
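
To make the three dimensions concrete, the following Python sketch estimates D0, D1 and D2 for a planar point set by box counting. It is a simplified illustration under our own assumptions and not the algorithm of [53]: the function names are hypothetical, the box sizes are supplied by the caller, and each dimension is taken as the least-squares slope of the corresponding quantity against the logarithm of the box size.

```python
from collections import Counter
from math import floor, log


def _slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def generalized_dimensions(points, box_sizes):
    """Estimate (D0, D1, D2) of a 2D point set by box counting."""
    total = len(points)
    log_eps, y0, y1, y2 = [], [], [], []
    for eps in box_sizes:
        # Number of points falling in each occupied box of side eps.
        counts = Counter((floor(x / eps), floor(y / eps)) for x, y in points)
        probs = [c / total for c in counts.values()]
        log_eps.append(log(eps))
        y0.append(-log(len(probs)))                # -log N(eps): capacity
        y1.append(sum(p * log(p) for p in probs))  # minus Shannon entropy: information
        y2.append(log(sum(p * p for p in probs)))  # log of sum p^2: correlation
    return _slope(log_eps, y0), _slope(log_eps, y1), _slope(log_eps, y2)
```

For a uniformly filled square all three estimates coincide at 2, and for points along a straight line at 1; a spread between D0, D1 and D2 signals multifractal behaviour.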


[Plot: "Fractal dimensions for Smin=600"; horizontal axis: distance threshold.]

Figure 5: Fractal spectrum of clusters with a minimum size of Smin = 600 points obtained from the
percolation on the intersection points.


Carefully inspecting the urban system at this maximum of its morphological characterisation, we
observe that the clusters at this distance threshold are in excellent correspondence with the
classification of urbanised space given by the Corine dataset, which is a classification of the
Landsat satellite images. The high level of agreement can be seen in Fig. 6, where the colours of
the clusters are chosen according to size, and the black contours correspond to the classified
urbanised areas.

3.2.2 Clusters of networks 

Let us now compute the fractal dimension of the clusters of networks that emerge from the
percolation on the network. In this case, the fractal dimension α of the system is computed in
terms of the scaling relationship between the mass of the clusters and the diameter of the network.
The mass is given by the number of intersection points N and the diameter is denoted by r_max,
leading to

$$N \sim r_{\max}^{\alpha} \qquad (1)$$


This corresponds to the same methodology implemented in m-. Note that for this system we need to
take a slightly larger maximum distance threshold, d = 800m, to ensure that we are well within the
cities definition. In addition, in order to include as many small settlements as possible in the
analysis, we use for networks a minimum cluster size of Smin = 50 nodes (a node is an intersection
point), instead of 600.
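
The exponent of the mass-diameter scaling can be estimated from a set of clusters by a straight-line fit in log-log space. The Python sketch below shows one plausible way to do this; the function name `fit_scaling_exponent` is hypothetical, and the choice of an ordinary least-squares fit is our assumption, since the fitting procedure is not described in the text.

```python
from math import log


def fit_scaling_exponent(masses, diameters):
    """Least-squares slope of log(mass) against log(diameter).

    masses: number of intersection points N of each cluster.
    diameters: network diameter r_max of each cluster, same order.
    """
    xs = [log(r) for r in diameters]
    ys = [log(n) for n in masses]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

Applying the fit to the clusters above the minimum size at each distance threshold would give one exponent per threshold, which is the kind of curve the network analysis examines for a maximum.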

For this case the results show a maximum at around d = 300m, see Fig. 7.

[Plot: "Fractal dimension for Smin=50".]

Figure 7: Fractal dimension of the whole urban system. It is computed on the networks obtained at
different distance thresholds, using the scaling relationship between mass and diameter given by
Eq. (1).


And once again we see that the urban system defined at this maximum is in excellent correspondence
with the definition of cities. Fig. 6 shows the clusters at d = 300m, with the contours for the
classified urbanised areas in black.

We quantify the level of agreement between the clusters obtained through the percolation method and
the urbanised areas in the Corine dataset through a correlation measure. Fig. 8 indicates that the
maximum of this correlation is also in the vicinity of d = 300m. Further details of how this measure
is computed can be found in the appendix.

Figure 8: Correlation of the clusters from the network percolation with the boundaries of the Corine
dataset.

It is important to reiterate that the distance is neither universal nor uniquely defined. It is not
universal, because it depends on the nature of the dataset. Hence a distance of d = 300m might suit
this specific dataset for Britain, but might not suit another dataset, nor another European country.
It is not uniquely defined, because the maximum corresponds to some sort of plateau. Hence any
definition in the vicinity of d = 300m would be as accurate or as inaccurate as the one for
d = 300m.

4 Conclusions 

Throughout this work we have shown that percolation theory can be applied to the street
intersections and network in order to uncover the intrinsic hierarchical structure of the urban
system. We argued that such an organisation does not only relate to ideological or geographical
divisions, but also represents a socio-economic polarisation of the system. It has been extensively
discussed that regional development is tightly linked to the development of its infrastructure,
hence it should not be surprising to find these economic patterns reflected in the density of the
road network.

Regions are formed by settlements which share a stronger connectivity among themselves than with
settlements of other regions. It is therefore not surprising to recover historical trajectories and
existing alliances. In this sense, looking at the highly integrated region of the cities in the
North at d = 540m, it appears that the assimilation of Newcastle within the Northern Powerhouse
proposal needs to be done with care, so that it is not left behind, given its weaker connectivity to
the rest of the cities in the region. More generally, many regional policies need to consider the
strong ties that lie within the urban system.

From the perspective of the methodology, this formalism has the advantage that it can be implemented
for incomplete datasets. We would have to test the robustness of the method with respect to the
incompleteness of the dataset, but at this stage it is clear that the road intersections serve as a
good proxy for urbanisation [58], and that the percolation process on the point patterns recovers
the hierarchical organisation of the system. It is to be expected that the level of detail provided
by the dataset would affect the level of detail for some of the transitions in the system;
nevertheless, a hierarchical sketch can still be recovered. In addition, we have extrapolated the
method to other spatial distributions where data are sparse, such as census data from the eleventh
century, i.e. data from the Domesday Book.

Finally, this work has also provided a framework to define boundaries of cities in a global way,
using a dataset that is open and not constrained to geographical units, such as the census data. In
previous work we developed a procedure to define cities using population density from the census
[12]. Although successful, this procedure relies very heavily on data only available every 10 years,
and on the level of granularity of the geographical unit. Nevertheless, it is still a useful method,
since through the use of commuting data functional areas beyond urban cores can be defined, such as
metropolitan areas. Note that a further refinement of the percolation approach can be found in [59],
where each city is adjusted to its condensation threshold.

Future research is needed to understand the mechanisms that drive the system to a maximum fractal
dimension at the point where the cities reach their urban extent.
Figure 4: Dendrogram of the evolution of some of the largest clusters through the percolation process. 
The size is measured according to the number of intersection points. 

Figure 6: All clusters appearing at the maximum fractal value. Left, for the intersection points at 
d = 180m; right, for the network at d = 300m, and black contours for the Corine dataset. 

Appendix 

A Algorithm for the percolation on the intersection points 

The algorithm is based on the geographical location of intersections. We consider a pair of intersections 
as connected if they are no more than d meters apart. In order to reduce the computational complexity 
of the procedure, the actual analysis is performed using a grid of square cells (10 x 10 meters each). 
A cell has one of two values: 1 if at least one intersection is within its area or null if it contains no 
intersections. As the percolation analysis is based on distance, we calculate a distance grid where each 
cell is assigned the distance to the closest cell that contains an intersection. We use this grid in the 
percolation procedure. 

The percolation procedure for a distance d consists of the following steps: 

1. Each cell of the distance grid that has a distance value of d meters or below is marked as 1, otherwise, 
it is marked as null. 

2. A unique identifier is assigned to each contiguous set of marked cells. A cell is considered adjacent
to its four nearest neighbours (i.e., its von Neumann neighbourhood).

3. Each intersection is assigned the unique identifier of its containing cell. 

The method is implemented in ESRI ArcMap 10.1 using the following tools: 


• The intersection grid is created using the Point to Raster tool.

• The distance grid is created using the Euclidean Distance tool.

• The marked cells grid is created using the Raster Calculator tool. 

• The unique identifiers grid is created using the Region Group tool. 

• The unique identifiers are copied to the intersection points using the Extract Values to Points tool. 
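
The three steps above can be sketched outside ArcMap as follows. This is only a minimal illustration in plain Python, not the implementation used in this work: the function name `grid_percolation` and its parameters are hypothetical, distances are measured between cell centres, and only the cells that end up marked are stored instead of a full raster.

```python
from collections import deque
from math import floor


def grid_percolation(points, d, cell=10.0):
    """Return (labels, n_clusters): one cluster label per (x, y) point in metres."""
    # Raster cell (column, row) containing each intersection.
    cells = [(floor(x / cell), floor(y / cell)) for x, y in points]
    occupied = set(cells)

    # Step 1: mark every cell whose distance to an occupied cell is d or below.
    r = int(d // cell)
    offsets = [(i, j)
               for i in range(-r, r + 1)
               for j in range(-r, r + 1)
               if (i * i + j * j) * cell * cell <= d * d]
    marked = {(cx + i, cy + j) for cx, cy in occupied for i, j in offsets}

    # Step 2: give one identifier to each contiguous set of marked cells,
    # where adjacency means the four nearest neighbours.
    label = {}
    n_clusters = 0
    for start in marked:
        if start in label:
            continue
        label[start] = n_clusters
        queue = deque([start])
        while queue:
            cx, cy = queue.popleft()
            for nb in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                if nb in marked and nb not in label:
                    label[nb] = n_clusters
                    queue.append(nb)
        n_clusters += 1

    # Step 3: each intersection takes the identifier of its containing cell.
    return [label[c] for c in cells], n_clusters
```

The identifiers themselves are arbitrary; only whether two intersections share one is meaningful. Since step 1 marks all cells within d of an intersection, two intersections end up in the same cluster when their marked neighbourhoods overlap or touch.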

B Algorithm for the network based percolation 

Given a graph of the road network, where nodes represent intersections and the weight of each edge
is the length of the street that connects them, and given a certain metric threshold (e.g. 5000m), we
produce a network percolation via the following steps:


1. We select the link of the graph with the smallest weight (distance), generating a new cluster and 
inserting both its nodes into the cluster. 


2. We keep a first-in first-out queue of nodes to expand, from which we extract a node to continue the 
process. We add both nodes of the link selected in step 1 to this queue. Nodes are only added to 
this queue if they are not already included. 

3. We extract a node from the queue of nodes to explore. For every link departing from that node that
is not yet included in the cluster and whose length is smaller than the threshold, we include the
link in the cluster and add its end node to the queue of nodes to explore.

4. We repeat step 3 until no further node can be expanded (the queue is empty). If there are links
left in the graph that do not belong to any cluster, we generate a new cluster by choosing the smallest
available link and repeat from step 1.
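
The steps above amount to a breadth-first growth of clusters along links shorter than the threshold. A minimal sketch in Python is given below. The function name `network_percolation` and the edge-list input format are hypothetical, and the sketch assumes that links at or above the threshold never seed a cluster, so nodes reachable only through such links receive no label.

```python
from collections import defaultdict, deque


def network_percolation(edges, threshold):
    """edges: list of (node_u, node_v, length). Returns {node: cluster_id}."""
    edges = list(edges)
    adjacency = defaultdict(list)
    for u, v, length in edges:
        adjacency[u].append((v, length))
        adjacency[v].append((u, length))

    cluster = {}
    next_id = 0
    # Steps 1 and 4: seed each new cluster with the smallest unassigned link.
    for u, v, length in sorted(edges, key=lambda e: e[2]):
        if length >= threshold:
            break  # assumption: remaining links are too long to seed a cluster
        if u in cluster:
            continue  # this link was already absorbed by an earlier cluster
        cluster[u] = cluster[v] = next_id
        queue = deque([u, v])  # step 2: first-in first-out queue of nodes
        while queue:  # step 3, repeated until the queue is empty
            node = queue.popleft()
            for neighbour, link_length in adjacency[node]:
                if link_length < threshold and neighbour not in cluster:
                    cluster[neighbour] = next_id
                    queue.append(neighbour)
        next_id += 1
    return cluster
```

For a fixed threshold the result is the set of connected components of the graph restricted to links shorter than the threshold, so the order in which seeds are chosen affects only the identifiers and not the membership of the clusters.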


C Details of the clusters at the transitions 

At each of the critical distances defining a transition, a set of clusters appears. There can be thousands
of these, hence we opt to visualise only the 10 largest ones, with colours representing rank by size: red
is the biggest, blue is the second biggest, green the third, etc. The rank and colour are illustrated in Fig.

As discussed in the main text, although the critical distances differ between the two systems (the percolation
on the intersection points and the percolation on the network), see Fig. the results are very similar.
The maps in Fig. S1 and Fig. S2 represent the transitions for the percolation on the intersection points
and on the network respectively.

D Correlation measure of the network percolation clusters and the urban area

For a given percolation result on the network, we categorise each intersection as either urban or
non-urban. We define an urban intersection as an intersection that belongs to a cluster that is larger
than Smin = 50, while the rest of the intersections are considered to be non-urban.

We use the polygons defined in the Corine dataset as a reference. We generate a grid of 1 km by 1 km
squares over the whole territory of the UK. For each square of the grid we assign two values: the first
is the area of the polygon that corresponds to the Corine cluster that intersects the square; the second
is the mass (the number of intersections) of the percolation cluster that has the most intersections in
the square. In order to be able to compare both systems we perform two types of analysis. The first is
a Pearson product-moment correlation between the values assigned to each square of the grid when there
is both a Corine cluster and a percolation cluster. Fig. shows that the highest correlation, with
R² > 0.7, is given for the range of distances 300m to 400m. The second type is a measure of error
comparing both. The procedure is as follows: among the squares of the grid that do not have both types
of clusters, we count the number of squares that have a Corine cluster but not a percolation cluster,
and, the other way round, the number of squares that have a percolation cluster but not a Corine
cluster. Finally we add both counts to get the total number of squares that do not have coincident
clusters. The result is given in Fig. S3, and this shows that the total number of squares without
coincident clusters is minimised for d = 300m.
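
Both comparisons can be sketched in a few lines of Python once every grid square carries its two values. The function name `compare_with_corine` and the convention that a value of 0 means "no cluster in this square" are assumptions made for this illustration only.

```python
from math import sqrt


def compare_with_corine(corine_area, cluster_mass):
    """Compare per-square Corine areas with per-square percolation masses.

    Both inputs are equally long sequences with one value per grid square;
    0 stands for "no cluster in this square" (an assumed convention).
    Returns (pearson_r, corine_only, percolation_only, total_mismatch).
    """
    pairs = list(zip(corine_area, cluster_mass))

    # Error measure: squares holding one type of cluster but not the other.
    corine_only = sum(1 for a, m in pairs if a > 0 and m == 0)
    percolation_only = sum(1 for a, m in pairs if a == 0 and m > 0)

    # Pearson product-moment correlation over squares holding both types.
    # Needs at least two such squares with non-constant values.
    both = [(a, m) for a, m in pairs if a > 0 and m > 0]
    n = len(both)
    mean_a = sum(a for a, _ in both) / n
    mean_m = sum(m for _, m in both) / n
    cov = sum((a - mean_a) * (m - mean_m) for a, m in both)
    var_a = sum((a - mean_a) ** 2 for a, _ in both)
    var_m = sum((m - mean_m) ** 2 for _, m in both)
    pearson_r = cov / sqrt(var_a * var_m)

    return pearson_r, corine_only, percolation_only, corine_only + percolation_only
```

Running this for every distance threshold d and keeping the total mismatch count would produce one point per distance of an error curve of the kind described above.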


E Average cluster size and cluster distribution 

A traditional approach to detecting transitions in percolation processes is to look at the average cluster
size after removing the largest component. In order to avoid very small clusters that hold no information
with respect to the hierarchical structure of the urban system under consideration, we impose a minimum
cluster size. We select this minimum size to be Smin = 600 intersections, since it gives enough resolution,
and includes very small settlements. To put this number into context, the number of intersection points 
in large cities is of the order of 10^ and of 10^ for the 30 largest ones. 

The results for both methods are given in Fig. S4. Given the multiplicity of transitions arising from
the different merging processes, the curve presents many different peaks. The different sizes of the second
largest clusters after transitions take place obscure many of the transitions that take place in the system.
An overall picture becomes easier to grasp if one looks at the evolution of the largest cluster size, as done
in Fig.

Let us now look at the distribution of cluster sizes. We investigate whether these are power law
distributed. We use the method developed by [60], where a power law can be ruled out if p < 0.1. Note
that p > 0.1 does not guarantee that the distribution follows a power law. We compute the distribution
for clusters that have at least 1000 points, and we remove in all cases the largest cluster, so that the
giant cluster is never considered. Given that we have a multiplicity of transitions, the second largest, and
sometimes the top largest ones, can still be quite large compared to the rest of the clusters, especially
for large distances. The results, up to distance d = 760m, are presented in Fig. The cumulative
distributions for some of the distances are given in Fig. S6. At the transitions, we note that a power
law cannot be rejected. For small distances, we observe that around the transition, the cluster sizes are
power law distributed. There is a clear region of distances, after cities have formed and before the next
transition occurs, at which the sizes are not distributed as a power law. The merging mechanism leading
to the multiplicity of percolation transitions translates into a fluctuating exponent of the system. It is
important to remember that exponents arising from power laws are always very sensitive to the sample
considered in the distribution, the number of events, etc. In this case, we only consider clusters that have
at least 1000 points and we always remove the largest one. The number of clusters hence varies enormously
from distance to distance. In addition, one could argue that given the multiplicity of transitions, removing
the largest cluster still leaves us with very large remaining ones that will be responsible for the next
transitions; this is most evident for large d.

In conclusion, the value of the exponent does not define the threshold at which cities are defined, nor
whether the system is composed of cities or regions. It is important to recall that an urban system
obeying a perfect Zipf law will have an exponent around 2 for the cumulative distribution of population
size. The observed fluctuations hence contribute to the debate of whether cities are universally distributed
according to a Zipf law or not, and certainly tell us that an urban system distributed according to a Zipf
law does not necessarily represent cities. One might argue that being distributed according to Zipf’s law
is a necessary but not a sufficient condition for a system of clusters to represent cities.
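
As a rough illustration of the first stage of the power-law check used in this section, the sketch below fits the exponent by maximum likelihood and computes the Kolmogorov-Smirnov distance between the data and the fitted law. It is a simplified sketch, not the full procedure of [60]: it assumes the continuous approximation for the cluster sizes and a fixed lower cutoff, and it omits the bootstrap that yields the p-value. The function name `fit_power_law` is hypothetical.

```python
from math import log


def fit_power_law(sizes, xmin=1000):
    """Fit p(x) ~ x^(-alpha) to cluster sizes >= xmin, largest cluster removed.

    Returns (alpha, ks_distance). Requires at least one remaining size > xmin.
    """
    tail = sorted(s for s in sizes if s >= xmin)
    tail = tail[:-1]  # remove the largest cluster, as done in the text
    n = len(tail)

    # Maximum-likelihood exponent for a continuous power law with cutoff xmin.
    alpha = 1.0 + n / sum(log(x / xmin) for x in tail)

    # Kolmogorov-Smirnov distance between empirical and fitted CDFs.
    ks_distance = 0.0
    for i, x in enumerate(tail):
        model_cdf = 1.0 - (x / xmin) ** (1.0 - alpha)
        ks_distance = max(ks_distance,
                          abs((i + 1) / n - model_cdf),
                          abs(model_cdf - i / n))
    return alpha, ks_distance
```

In the full method the p-value is the fraction of synthetic power-law samples whose fitted distance exceeds the observed one; that step is left out here to keep the sketch short.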

F Multifractal spectrum, dimensions: $D_0$, $D_1$ and $D_2$

In this section we provide a summary of the main mathematical expressions to compute the fractal
dimensions $D_0$, $D_1$ and $D_2$. This is an extract from [5^, where we give a detailed review of the
methodology employed in order to compute these three measures from the whole multifractal spectrum. In
summary, a (mono-)fractal has a measure $\mu$ that is homogeneous in space, such that within a small region
of size $\epsilon$

$$\mu(\epsilon) \sim \epsilon^{D} \qquad (2)$$

where $D$ is the fractal dimension. For a multifractal, $\mu$ is no longer homogeneous, and hence for each
region $i$ we can define a distribution function $p_i(\epsilon)$ of the measure

$$p_i(\epsilon) \sim \epsilon^{\alpha_i} \qquad (3)$$

where each subdivision of the space $i$ has a value $\alpha_i$. A fractal dimension $f(\alpha_i)$ can then be
associated with the set of regions with the same value. The moments of the distribution function are
obtained through the function

$$Z_q(\epsilon) = \sum_i p_i(\epsilon)^{q} \sim \epsilon^{\tau(q)} \qquad (4)$$

The exponent $\tau(q)$ can be written in terms of the generalised fractal dimension $D_q$ as

$$\tau(q) = q\,\alpha(q) - f[\alpha(q)] = (q - 1)\,D_q \qquad (5)$$

where

$$D_q = \frac{1}{q-1} \lim_{\epsilon \to 0} \frac{\log_{10} Z_q(\epsilon)}{\log_{10}(\epsilon)} \qquad (6)$$

defines the whole spectrum for $q \in (-\infty, \infty)$. The 3 fractal dimensions are hence obtained for
$q \in \{0, 1, 2\}$. For a monofractal, $D_q$ is a constant for all $q$. For $q = 0$, the fractal dimension
$D_0$ corresponds to the dimension obtained through a box-counting algorithm. For $q = 1$, we get

$$D_1 = \lim_{\epsilon \to 0} \frac{-\sum_i p_i \log_{10} p_i}{-\log_{10}(\epsilon)} \qquad (7)$$

which has a very similar form to Shannon’s entropy, and this is why this is called the information
dimension. Finally, for $q = 2$, the dimension $D_2$ takes the form

$$D_2 = \lim_{\epsilon \to 0} \frac{\log_{10} \sum_i p_i^{2}}{\log_{10}(\epsilon)} \qquad (8)$$

and this in our case gives the correlations for pairs of intersection points to lie within the same box
$\epsilon$.
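
The three dimensions for q = 0, 1 and 2 can be estimated numerically from a set of points by box counting. The sketch below is a minimal illustration under stated assumptions: the limit of vanishing box size is approximated by the least-squares slope over a user-supplied list of box sizes, and the names `generalised_dimensions` and `_slope` are hypothetical and not taken from the reference cited in this section.

```python
from collections import Counter
from math import floor, log10


def _slope(xs, ys):
    """Least-squares slope of ys against xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return num / sum((x - mean_x) ** 2 for x in xs)


def generalised_dimensions(points, box_sizes, qs=(0, 1, 2)):
    """Estimate D_q for a list of (x, y) points; returns {q: D_q}."""
    x0 = min(x for x, _ in points)
    y0 = min(y for _, y in points)
    log_eps = [log10(eps) for eps in box_sizes]
    curves = {q: [] for q in qs}
    for eps in box_sizes:
        # p_i: fraction of the points falling in box i of side eps.
        counts = Counter((floor((x - x0) / eps), floor((y - y0) / eps))
                         for x, y in points)
        total = sum(counts.values())
        p = [c / total for c in counts.values()]
        for q in qs:
            if q == 1:
                # Information dimension: sum_i p_i log p_i against log eps.
                curves[q].append(sum(pi * log10(pi) for pi in p))
            else:
                # General case: log Z_q / (q - 1) against log eps.
                curves[q].append(log10(sum(pi ** q for pi in p)) / (q - 1))
    return {q: _slope(log_eps, curves[q]) for q in qs}
```

For points spread evenly over a square all three estimates should be close to 2, and for points spread evenly along a line close to 1, which gives a quick sanity check before applying the estimate to intersection points.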

Figure S1: Maps of clusters at the transitions for the percolation on the intersection points. Only the 10
largest clusters have colours following the legend in the hierarchical tree.

Figure S2: Maps of clusters at transitions for the network percolation. Only the 10 largest clusters have
colours following the legend in the hierarchical tree.


[Plot for Figure S3. Horizontal axis: Distance. Legend "Type of squares": not in Clusters, not in Corine, total.]

Figure S3: Measure of error of concurrency between the network percolation clusters and the urbanised 
clusters according to the Corine dataset. 




Figure S4: Evolution of the average cluster size removing the largest cluster, and including only clusters 
with at least 600 intersections. Top, plot for the percolation on the intersection points; bottom, plot for 
the percolation on the network. 

using the method by Clauset et al. in [60]. If p < 0.1 the power law can be rejected, and these are the red
crosses.